Powered by Rmarkdown.
NCT_ID →(JensenLab:Tagger)→ DOID
NCT_ID →(AACT)→ MeSH
NCT_ID →(NextMove:LeadMine)→ SMILES
SMILES →(PubChem)→ CID
CID →(PubChem)→ INCHIKEY
INCHIKEY →(ChEMBL)→ MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID →(ChEMBL)→ ACTIVITY_ID
ACTIVITY_ID →(ChEMBL)→ TARGET_CHEMBL_ID
TARGET_CHEMBL_ID →(ChEMBL)→ COMPONENT_ID
COMPONENT_ID →(ChEMBL)→ ACCESSION
ACTIVITY_ID →(ChEMBL)→ DOCUMENT_CHEMBL_ID
DOCUMENT_CHEMBL_ID →(ChEMBL)→ PUBMED_ID
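The drug-to-target part of the chain above can be written down as a small lookup table and traversed. This is only an illustrative sketch: `steps` and the helper `follow_chain()` are hypothetical names, not objects from the report, and only the linear NCT_ID to ACCESSION branch is encoded.

```r
# Each row is one mapping step: source ID type, resource used, target ID type.
# Only the linear drug -> protein branch of the chain is encoded here.
steps <- data.frame(
  from = c("NCT_ID", "SMILES", "CID", "INCHIKEY", "MOLECULE_CHEMBL_ID",
           "ACTIVITY_ID", "TARGET_CHEMBL_ID", "COMPONENT_ID"),
  via  = c("NextMove:LeadMine", "PubChem", "PubChem", "ChEMBL", "ChEMBL",
           "ChEMBL", "ChEMBL", "ChEMBL"),
  to   = c("SMILES", "CID", "INCHIKEY", "MOLECULE_CHEMBL_ID", "ACTIVITY_ID",
           "TARGET_CHEMBL_ID", "COMPONENT_ID", "ACCESSION"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow the steps from a starting ID type
# until no further step is defined.
follow_chain <- function(steps, start) {
  path <- start
  current <- start
  while (current %in% steps$from) {
    current <- steps$to[match(current, steps$from)]
    path <- c(path, current)
  }
  path
}

drug_target_path <- follow_chain(steps, "NCT_ID")
print(paste(drug_target_path, collapse = " -> "))
```

Each hop corresponds to one join between two of the TSV files, so the number of hops is also the number of merges needed to get from a study to a protein accession.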
aact_studies.tsv
aact_drugs.tsv
aact_descriptions.tsv
aact_drugs_leadmine.tsv
aact_drugs_smi_pubchem_cid.tsv
aact_drugs_smi_pubchem_cid2ink.tsv
aact_drugs_ink2chembl.tsv
aact_drugs_chembl_activity_pchembl.tsv
aact_drugs_chembl_target_component.tsv
pharos_targets.tsv
aact_descriptions_tagger_matches.tsv
diseases_entities.tsv
nct_id is the study ID.
## [1] "Wed Apr 10 14:38:49 2019"
library(readr)
library(data.table)
library(stringr)
library(plotly, quietly=TRUE)
Read file of all studies in AACT.
## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
References of type results_reference may offer stronger evidence and greater confidence.
## [1] "references: 388031; NCT_IDs: 61208; PMIDs: 287758; results_references: 64880"
Read file of all drugs in AACT.
id is AACT ID.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
Select only Interventional studies (study_type) associated with drugs (via NCT_ID).
## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"
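The selection described above might look roughly like the following. This is a sketch on toy data, not the report's code: `study_type`, `nct_id` and `id` are taken from the text, while the `name` column and the toy NCT IDs are assumptions.

```r
# Toy stand-ins for aact_studies.tsv and aact_drugs.tsv (values are made up).
studies <- data.frame(
  nct_id     = c("NCT0001", "NCT0002", "NCT0003", "NCT0004"),
  study_type = c("Interventional", "Observational",
                 "Interventional", "Interventional"),
  stringsAsFactors = FALSE
)
drugs <- data.frame(
  id     = 1:3,                                   # AACT intervention ID
  nct_id = c("NCT0001", "NCT0001", "NCT0003"),
  name   = c("paclitaxel", "cisplatin", "metformin"),  # assumed column name
  stringsAsFactors = FALSE
)

# Step 1: keep Interventional studies only.
interventional <- studies[studies$study_type == "Interventional", ]
pct <- 100 * nrow(interventional) / nrow(studies)

# Step 2: keep those with at least one drug, linked via NCT_ID.
drug_studies <- interventional[interventional$nct_id %in% drugs$nct_id, ]

print(sprintf("Interventional studies: %d (%.1f%%)", nrow(interventional), pct))
print(sprintf("Interventional drug studies: %d ; unique NCT_IDs: %d",
              nrow(drug_studies), length(unique(drug_studies$nct_id))))
```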
| phase | N_studies | N_drugs |
|---|---|---|
| Early Phase 1 | 1574 | 2615 |
| Phase 1 | 23603 | 48593 |
| Phase 1/Phase 2 | 6663 | 13288 |
| Phase 2 | 33910 | 68850 |
| Phase 2/Phase 3 | 3305 | 6503 |
| Phase 3 | 22988 | 49507 |
| Phase 4 | 19593 | 36331 |
| NA | 12785 | 29390 |
| overall_status | N_studies | N_drugs |
|---|---|---|
| Active, not recruiting | 6420 | 13962 |
| Completed | 72053 | 145006 |
| Enrolling by invitation | 638 | 1060 |
| Not yet recruiting | 4138 | 8001 |
| Recruiting | 16723 | 33973 |
| Suspended | 463 | 945 |
| Terminated | 10138 | 19618 |
| Unknown status | 10106 | 18463 |
| Withdrawn | 3742 | 6969 |
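Per-group counts like those in the two tables above could be produced with a grouping helper such as the one below. This is a sketch under assumptions: `count_by()` is a hypothetical helper, the toy rows are made up, and N_drugs is assumed here to count distinct intervention IDs, which the report does not state.

```r
# One row per (study, drug intervention); toy values only.
study_drugs <- data.frame(
  nct_id  = c("NCT0001", "NCT0001", "NCT0002",
              "NCT0003", "NCT0003", "NCT0003"),
  phase   = c("Phase 1", "Phase 1", "Phase 2",
              "Phase 2", "Phase 2", "Phase 2"),
  drug_id = 1:6,
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct studies and distinct drugs per group.
# Note: split() drops rows whose grouping value is NA, so a table row
# for NA (as in the phase table) would need explicit handling.
count_by <- function(df, col) {
  groups <- split(df, df[[col]])
  data.frame(
    group     = names(groups),
    N_studies = vapply(groups, function(g) length(unique(g$nct_id)), integer(1)),
    N_drugs   = vapply(groups, function(g) length(unique(g$drug_id)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

by_phase <- count_by(study_drugs, "phase")
print(by_phase)
```

The same helper would apply to `overall_status` by passing that column name instead of `phase`.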
## Warning: Ignoring 1 observations
## Warning: Ignoring 1 observations
AACT drug names are resolved to standard names and structures (as SMILES) by LeadMine. Note that one name may include multiple chemicals. With resolved structures, drugs can be counted in a cheminformatically rigorous way, as active pharmaceutical ingredients (APIs).
## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"
| N_mentions | names |
|---|---|
| 2637 | Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol |
| 2545 | CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide |
| 2461 | CISPLATIN; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum |
| 2070 | DEXAMETHASONE; Dexamethason; Dexamethasone; Dexamethosone; Maxitrol; OZURDEX; Oradexon; Ozurdex; dexamethason; dexamethasone; dexamethosone |
| 2054 | CARBOPLATIN; Carboplatin; Carboplatine; Paraplatin; carboplatin; carboplatine |
| 1779 | DOCETAXEL; Docetaxel; docetaxel |
| 1625 | METFORMIN; MetFORMIN; Metformin; Metformine; metformin; metformine |
| 1540 | GEMCITABINE; Gemcitabine; gemcitabine |
| 1342 | CAPECITABINE; Capecitabin; Capecitabine; XELODA; Xeloda; capecitabine; xeloda |
| 1178 | Cortancyl; Lodotra; Meticorten; Prednison; Prednisone; RAYOS; prednison; prednisone |
| 1157 | 0xaliplatin; Eloxatin; OXALIPLATIN; OXAliplatin; Oxaliplatin; Oxaliplatine; eloxatin; oxaliplatin; oxaliplatine |
| 1157 | METHOTREXATE; Methotrexate; Metoject; methotrexate |
| 1086 | BUPIVACAINE; Bupivacain; Bupivacaine; EXPAREL; Exparel; SKY0402; bupivacain; bupivacaine |
| 1044 | ETOPOSIDE; Etoposid; Etoposide; etoposide |
| 1027 | ADOPORT; ADVAGRAF; Adoport; Advagraf; ENVARSUS; Envarsus; FK-506; FK506; PROGRAF; Prograf; Protopic; TACROLIMUS; Tacrolimus; tacrolimus |
| 978 | NORMAL SALINE; Normal Saline; Normal saline; normal salin; normal saline |
| 977 | LIDOCAINE; LMX 4; LMX4; Lidocain; Lidocaine; Lidoderm; Lignocain; Lignocaine; Oraqix; lidocain; lidocaine; lignocaine |
| 908 | CYTARABINE; Cytarabine; Cytosar; DepoCyt; DepoCyte; Depocyt; Depocyte; cytarabine; cytosar |
| 903 | COPEGUS; Copegus; REBETOL; RIBAVIRIN; Rebetol; Ribasphere; Ribavarin; Ribavirin; Ribavirine; Virazole; rebetol; ribavarin; ribavirin |
| 846 | Diprivan; PROPOFOL; Propofol; propofol |
## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"
## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"
## [1] "Mentions by study: 92966 / 99647 (93.3%)"
## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
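The coverage lines above all follow the same "hits / total (percent)" pattern. A minimal sketch of that arithmetic, with `coverage()` as a hypothetical helper name, reproduces two of the reported figures from their counts:

```r
# Hypothetical helper: format a hit count against a total, with percentage.
coverage <- function(n_hit, n_total) {
  sprintf("%d / %d (%.1f%%)", n_hit, n_total, 100 * n_hit / n_total)
}

# Counts copied from the report output above.
by_name_mentions <- coverage(180555L, 197300L)
by_drug_name     <- coverage(11108L, 58297L)

print(paste("Drugs (drug names) with resolved structure:", by_name_mentions))
print(paste("Mentions by drug name:", by_drug_name))
```

The gap between the two figures is the point of the comparison: a minority of distinct drug names (19.1%) accounts for most name mentions (91.5%).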
## [1] "PubChem SMILES2CID hits: 3933 / 4540 (86.6%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153342"
## [1] "PubChem CIDs with InChIKeys: 3783"
Read the Pharos targets file, for Target Development Level (TDL) and other metadata.
Perhaps the mapping to ChEMBL should instead use PubChem CIDs and UniChem.
## [1] "ChEMBL compounds mapped via InChIKeys: 3316"
Select only activities with pChEMBL values, for confidence.
## [1] "ChEMBL activities: 124438"
## [1] "ChEMBL activities molecules: 2287 ; targets: 3832 ; documents: 16198"
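Restricting to activities with a pChEMBL value and then counting distinct molecules, targets and documents could be sketched as follows. The column names follow ChEMBL field naming but are assumptions about the report's file, and all IDs and values are toy data.

```r
# Toy stand-in for the ChEMBL activity file; NA means no pChEMBL value.
activities <- data.frame(
  activity_id        = 1:5,
  molecule_chembl_id = c("CHEMBL1", "CHEMBL1", "CHEMBL2", "CHEMBL3", "CHEMBL3"),
  target_chembl_id   = c("CHEMBL10", "CHEMBL11", "CHEMBL10", "CHEMBL12", "CHEMBL12"),
  document_chembl_id = c("CHEMBL100", "CHEMBL100", "CHEMBL101", "CHEMBL102", "CHEMBL103"),
  pchembl_value      = c(7.2, NA, 5.5, NA, 8.1),
  stringsAsFactors = FALSE
)

# Keep only rows where a pChEMBL value is present.
with_pchembl <- activities[!is.na(activities$pchembl_value), ]

n_molecules <- length(unique(with_pchembl$molecule_chembl_id))
n_targets   <- length(unique(with_pchembl$target_chembl_id))
n_documents <- length(unique(with_pchembl$document_chembl_id))

print(sprintf("ChEMBL activities: %d", nrow(with_pchembl)))
print(sprintf("ChEMBL activities molecules: %d ; targets: %d ; documents: %d",
              n_molecules, n_targets, n_documents))
```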
## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1805"
## [1] "Organisms: 187"
| organism | N_targets | Types |
|---|---|---|
| Homo sapiens | 1806 | CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; PROTEIN-PROTEIN INTERACTION; SELECTIVITY GROUP; SINGLE PROTEIN |
| Rattus norvegicus | 529 | PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SELECTIVITY GROUP; SINGLE PROTEIN |
| Mus musculus | 238 | CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN |
| Bos taurus | 98 | PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN |
| Sus scrofa | 36 | PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN |
| Cavia porcellus | 26 | SINGLE PROTEIN |
| Escherichia coli K-12 | 19 | PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN |
| Oryctolagus cuniculus | 18 | SINGLE PROTEIN |
| Escherichia coli | 17 | PROTEIN COMPLEX; SINGLE PROTEIN |
| Mycobacterium tuberculosis | 17 | SINGLE PROTEIN |
## [1] "Human targets: 1806"
| idgFamily | N |
|---|---|
| Kinase | 405 |
| Enzyme | 330 |
| GPCR | 158 |
| Non-IDG | 120 |
| Ion Channel | 64 |
| Transporter | 53 |
| Epigenetic | 35 |
| Nuclear Receptor | 28 |
| Transcription Factor | 20 |
| TF/Epigenetic | 3 |
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
## [1] " Tchem: 733" " Tclin: 341" " Tbio: 140"
## [4] " Tdark: 2"
Tagger is run with the JensenLab DOID entities dictionary, on descriptions from the detailed_descriptions table.
serialno corresponds with DOID.
id is AACT primary key.
Likely false positives, manually removed:
| doid | N_mentions | terms |
|---|---|---|
| DOID:162 | 28596 | CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; malignant Tumor; malignant neoplasm; malignant tumor; primary cancer |
| DOID:9351 | 17274 | DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus; diabetes mellitus; diabetes-mellitus |
| DOID:6713 | 16632 | CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE; Stroke; cerebro- vascular disease; cerebro-vascular disease; cerebrovascul… |
| DOID:2030 | 12084 | ANXIETY; Anxiety; Anxiety Disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; anxiety syndrome; anxiety-state |
| DOID:1612 | 10583 | BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer; breast Cancer; breast caNcEr; breast cancer; breast tumor; breast-canc… |
| DOID:2841 | 10021 | ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper reactivity; bronchial hyper-reactivity; bronchial hyperreactivity; … |
| DOID:3083 | 9782 | CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive Lung disease; Chronic Obstructive Pulmonary Disease; Chronic Obstructive Pulmonary dis… |
| DOID:9970 | 9303 | OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity |
| DOID:10763 | 9144 | HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high blood Pressure; high blood pressure; high blood-pressure; htn; hyper-… |
| DOID:3393 | 6816 | C-HD; CAD; CHD; CORONARY ARTERY DISEASE; CORONARY SYNDROME; CORONARY syndrome; ChD; Coronary ARtery DIsease; Coronary Artery Disease; Coronary Disease; Coronary Heart Disease; Coronary Heart diseas… |
| DOID:0060145 | 6115 | ANALGESIA; Analgesia; analgeSia; analgesia |
| DOID:9352 | 5848 | Diabetes Mellitus Type 2; Diabetes Mellitus Type II; Diabetes Mellitus type 2; Diabetes Mellitus, Type II; Diabetes mellitus Type 2; Diabetes mellitus non-insulin-dependent; Diabetes mellitus type … |
| DOID:10283 | 5056 | Familial Prostate Cancer; HPC; PRostate Cancer; Prostate CAncer; Prostate Cancer; Prostate cancer; Prostatic cancer; hereditary prostate cancer; prostate Cancer; prostate cancer; prostate-cancer; p… |
| DOID:8469 | 4985 | FLU; Flu; Influenza; flu; influenza |
| DOID:225 | 4962 | SYNDROME; Syndrome; syn drome; syndrome |
| DOID:3908 | 4959 | NSCLC; Non Small Cell Lung Cancer; Non Small Cell Lung Carcinoma; Non Small Cell Lung cancer; Non small cell lung cancer; Non small-cell lung cancer; Non- small cell lung cancer; Non-Small Cell Lun… |
| DOID:784 | 4841 | CKD; CKF; CRD; CRF; Chronic Kidney Disease; Chronic Kidney disease; Chronic Kidney failure; Chronic Renal Disease; Chronic kidney disease; Chronic kidney failure; Chronic renal disease; chronic Kid… |
| DOID:5419 | 4689 | SCHIZOPHRENIA; Schizophrenia; schizophrenia |
| DOID:684 | 3836 | HCC; HEPATOCELLULAR CARCINOMA; Hepatocellular Carcinoma; Hepatocellular carcinoma; Hepatoma; hcc; hepato-cellular carcinoma; hepatocellular Carcinoma; hepatocellular carcinoma; hepatoma |
| DOID:5844 | 3664 | Heart Attack; Heart attack; MYOCARDIAL INFARCTION; Myocardial Infarct; Myocardial Infarction; Myocardial infarct; Myocardial infarction; heart attack; myo-cardial infarction; myocardiaL infARction;… |
Sort synonym terms by frequency.
| nct_id | doid | N_mentions | disease_terms |
|---|---|---|---|
| NCT00006507 | DOID:2030 | 1 | Anxiety |
| NCT01081626 | DOID:5223 | 2 | infertility |
| NCT01081626 | DOID:11612 | 2 | polycystic ovarian syndrome;PCOS |
| NCT01081626 | DOID:3781 | 1 | anovulation |
| NCT01081626 | DOID:3459 | 1 | breast carcinoma |
| NCT01081626 | DOID:2394 | 1 | ovarian cancer |
| NCT01225861 | DOID:162 | 1 | cancer |
| NCT01987518 | DOID:6457 | 1 | Cowden Disease |
| NCT01987518 | DOID:3852 | 1 | Peutz Jeghers Disease |
| NCT02229344 | DOID:8577 | 1 | ulcerative colitis |
| NCT02492334 | DOID:1574 | 7 | AUD;alcohol use disorder |
| NCT02492334 | DOID:2030 | 2 | anxiety |
| NCT02492334 | DOID:2055 | 1 | posttraumatic stress disorder |
| NCT02524327 | DOID:162 | 1 | cancer |
| NCT02659930 | DOID:0111152 | 8 | MCD;multicentric Castleman disease |
| NCT02659930 | DOID:0060058 | 2 | lymphoma |
| NCT02659930 | DOID:635 | 2 | AIDS |
| NCT02659930 | DOID:225 | 3 | syndrome;Syndrome |
| NCT02659930 | DOID:8632 | 2 | Kaposi Sarcoma;Kaposi sarcoma |
| NCT03225716 | DOID:0060058 | 1 | lymphoma |
| NCT03225716 | DOID:0050746 | 2 | mantle cell lymphoma;MCL |
| NCT03225716 | DOID:0050745 | 1 | diffuse large B-cell lymphoma |
| NCT03225716 | DOID:1040 | 2 | chronic lymphocytic leukemia;CLL |
| NCT03225716 | DOID:1039 | 1 | prolymphocytic leukemia |
| NCT03225716 | DOID:707 | 1 | B-cell lymphoma |
| NCT03875781 | DOID:1993 | 2 | rectal cancer |
| NCT03875781 | DOID:162 | 1 | cancer |
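A per-study, per-disease summary with synonym terms ordered by frequency, as in the table above, might be built like this. It is a sketch only: `terms_by_frequency()` is a hypothetical helper, `term` is an assumed column name, and the rows are toy data with a placeholder study ID.

```r
# One row per Tagger match (toy data; "NCT_X" is a placeholder study ID).
matches <- data.frame(
  nct_id = rep("NCT_X", 6),
  doid   = c(rep("DOID:1574", 4), rep("DOID:2030", 2)),
  term   = c("AUD", "alcohol use disorder", "AUD", "AUD",
             "anxiety", "anxiety"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct terms, most frequent first, joined by ";".
terms_by_frequency <- function(terms) {
  counts <- sort(table(terms), decreasing = TRUE)
  paste(names(counts), collapse = ";")
}

# Collapse to one row per (study, disease).
summary_terms <- aggregate(term ~ nct_id + doid, data = matches,
                           FUN = terms_by_frequency)
summary_terms$N_mentions <- aggregate(term ~ nct_id + doid, data = matches,
                                      FUN = length)$term
names(summary_terms)[names(summary_terms) == "term"] <- "disease_terms"
print(summary_terms)
```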
And include references.
Keep only studies including both disease and drug mentions.
## [1] "studies linked to 1+ drugs AND 1+ diseases: 36202"
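Keeping only studies that have both a drug and a disease mention amounts to intersecting two sets of NCT_IDs. A minimal sketch, with toy ID vectors that are assumptions rather than report data:

```r
# NCT_IDs with at least one resolved drug, and with at least one disease match
# (toy values; repeated IDs stand for multiple mentions in one study).
drug_ncts    <- c("NCT0001", "NCT0001", "NCT0002", "NCT0003")
disease_ncts <- c("NCT0002", "NCT0003", "NCT0003", "NCT0004")

# intersect() returns the distinct IDs present in both vectors.
linked_ncts <- intersect(drug_ncts, disease_ncts)

print(sprintf("studies linked to 1+ drugs AND 1+ diseases: %d",
              length(linked_ncts)))
```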
ACTIVITY_ID →(ChEMBL)→ TARGET_CHEMBL_ID
TARGET_CHEMBL_ID →(ChEMBL)→ COMPONENT_ID
COMPONENT_ID →(ChEMBL)→ ACCESSION
## [1] "ACTIVITY_IDs: 124438 ; TARGET_CHEMBL_IDs: 3832 ; pairs: 124438"
## [1] "COMPONENT_IDs: 2535 ; TARGET_CHEMBL_IDs: 2481 ; pairs: 3157"
## [1] "ACCESSIONs: 2535 ; SINGLE_PROTEIN ACCESSIONs: 2183"
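The "IDs ; IDs ; pairs" lines above summarize each mapping by its distinct values on either side and its distinct pairs. One possible way to compute such a line is sketched below; `mapping_summary()` is a hypothetical helper and the table is toy data.

```r
# Toy target-to-component mapping; one pair is deliberately duplicated.
target_component <- data.frame(
  target_chembl_id = c("T1", "T1", "T2", "T3", "T3"),
  component_id     = c(1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct values in columns a and b, and distinct pairs.
mapping_summary <- function(df, a, b) {
  pairs <- unique(df[, c(a, b)])
  sprintf("%ss: %d ; %ss: %d ; pairs: %d",
          toupper(a), length(unique(df[[a]])),
          toupper(b), length(unique(df[[b]])),
          nrow(pairs))
}

component_line <- mapping_summary(target_component,
                                  "component_id", "target_chembl_id")
print(component_line)
```

When the pair count exceeds both ID counts, as in the COMPONENT_ID line of the report, the mapping is many-to-many: some targets have several components and some components belong to several targets.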
CID →(PubChem)→ INCHIKEY
INCHIKEY →(ChEMBL)→ MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID →(ChEMBL)→ ACTIVITY_ID
ACTIVITY_ID →(ChEMBL)→ DOCUMENT_CHEMBL_ID
## [1] "CIDs: 3783 ; INCHIKEYs: 3781 ; pairs: 3783"
## [1] "INCHIKEYs: 3314 ; MOLECULE_CHEMBL_IDs: 3314 ; pairs: 3316"
## [1] "MOLECULE_CHEMBL_IDs: 2287 ; TARGET_CHEMBL_IDs: 3832 ; ACTIVITY_IDs: 124438 ; DOCUMENT_CHEMBL_IDs: 16198"
DOCUMENT_CHEMBL_ID →(ChEMBL)→ PUBMED_ID
## [1] "DOCUMENT_CHEMBL_IDs:: 16198 ; PMIDs: 15193"